DDOPT(1)                                                            DDOPT(1)

NAME
     ddopt - MIPS Data-Dependency-based Optimizer

SYNOPSIS
     ddopt unopt_file opt_file [ -v -mips3 -hostcache -cachesz size ]

DESCRIPTION
     ddopt, the MIPS data-dependency-based optimizer, reads the input binary
     ucode file one procedure at a time, performs loop-based transformations
     on each outermost loop nest in each procedure, and outputs the optimized
     binary ucode file. By convention, it takes a binary ucode file with the
     extension .B or .M as input and outputs a binary ucode file with the
     extension .D. In the compilation process, ddopt runs after the front
     end, uld and usplit, and before umerge, uopt and ugen. Currently, ddopt
     accepts only ucode files generated from FORTRAN.

     ddopt borrows optimization techniques that originated in compilers for
     supercomputers and adapts them to scalar machines. It performs
     high-level analysis of the behavior of array accesses in loops,
     deriving what we call data dependency information, and bases numerous
     transformations of the program code on that information (hence the
     name ddopt). The transformations are invariably associated with
     program loops that operate on arrays.

     ddopt performs several kinds of transformations that benefit program
     performance:

     1.  Those that reduce memory references. Techniques include reusing
         array references that have been allocated to registers (register
         allocation for array references) and moving array references and
         assignments out of loops.

     2.  Those that improve the locality of memory references (thus reducing
         data cache misses). Techniques include changing the order of the
         loops in a nest (loop interchange) and partitioning loop iterations
         to operate on smaller sections of an array (strip-mining).

     3.  Those that reduce floating-point interlocks and promote greater
         parallelism among floating-point operations by creating larger
         pieces of straight-line code in loops. Techniques include unrolling
         and unroll-and-jam (unrolling an outer loop and jamming the
         resulting copies of the inner loop into one bigger loop).
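
     The three kinds of transformation can be illustrated on one small loop
     nest. The following is a hand-written C sketch, not ddopt output: the
     function names and the strip size bf are hypothetical, bf counts loop
     iterations rather than bytes, and the matrix is stored row-major as in
     C. FORTRAN arrays are column-major, so in ddopt's input language the
     cache-friendly loop order is the opposite of the one shown here. Every
     variant adds the terms for each y[i] in the same order, j = 0 to n-1,
     so all four perform the same floating-point additions.

```c
#include <assert.h>

/* Each routine computes y[i] += sum over j of a[i*n + j] * x[j] for an
   n x n matrix stored row-major (C order) in a flat buffer.
   Assumes a, x and y do not overlap. */

/* Before: the inner loop varies i, so successive a[] accesses are n
   elements apart, and y[i] is loaded and stored on every iteration. */
void matvec_ji(int n, const double *a, const double *x, double *y) {
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            y[i] += a[i * n + j] * x[j];
}

/* Loop interchange (unit-stride inner loop) plus register allocation for
   an array reference: y[i] does not change with j, so it is kept in a
   local variable and stored once per row. */
void matvec_ij(int n, const double *a, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double yi = y[i];
        for (int j = 0; j < n; j++)
            yi += a[i * n + j] * x[j];
        y[i] = yi;
    }
}

/* Strip-mining: the j loop is split into strips of bf iterations and the
   strip loop is moved outermost, so x[jj .. jj+bf-1] can stay in the data
   cache while every row uses it. */
void matvec_strip(int n, int bf, const double *a, const double *x,
                  double *y) {
    assert(bf > 0);
    for (int jj = 0; jj < n; jj += bf) {
        int jend = (jj + bf < n) ? jj + bf : n;
        for (int i = 0; i < n; i++)
            for (int j = jj; j < jend; j++)
                y[i] += a[i * n + j] * x[j];
    }
}

/* Unroll-and-jam: the i loop is unrolled by 2 and the two copies of the j
   loop are jammed into one, giving two independent accumulation chains
   and one load of x[j] per pair of rows. */
void matvec_unroll_jam2(int n, const double *a, const double *x,
                        double *y) {
    int i = 0;
    for (; i + 1 < n; i += 2) {
        double y0 = y[i], y1 = y[i + 1];
        for (int j = 0; j < n; j++) {
            double xj = x[j];
            y0 += a[i * n + j] * xj;
            y1 += a[(i + 1) * n + j] * xj;
        }
        y[i] = y0;
        y[i + 1] = y1;
    }
    for (; i < n; i++) /* leftover row when n is odd */
        for (int j = 0; j < n; j++)
            y[i] += a[i * n + j] * x[j];
}
```

     In this sketch the unroll-and-jam variant also keeps y[i] and y[i+1]
     in registers, which shows how the first and third kinds of
     transformation tend to reinforce each other.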

     ddopt also performs other optimizations solely to create more
     opportunities for the transformations above: local common subexpression
     elimination, secondary index variable elimination, constant
     propagation, copy propagation, constant folding, jump folding and dead
     code elimination. Some of these duplicate optimizations performed by
     uopt. They are applied iteratively until the code no longer changes,
     and they precede the data-dependency-based analyses and
     transformations.
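
     Secondary index variable elimination is the least familiar item in
     this list. A common reading, assumed in the C sketch below, is that an
     auxiliary variable advancing in lockstep with the loop index (k here)
     is replaced by an explicit expression in that index, since dependence
     analysis works on subscripts written in terms of loop indices. This
     interpretation and the function names are assumptions; the code is
     illustrative, not ddopt output.

```c
#include <assert.h>

/* Illustration of secondary index variable elimination, under the
   interpretation assumed in the text; a must hold at least 2*n + 1
   elements. */

/* Before: k advances by 2 on every iteration, so the subscript of a[k]
   is not expressed in terms of the loop index i. */
void fill_before(int n, double *a) {
    assert(n >= 0);
    int k = 0;
    for (int i = 0; i < n; i++) {
        k = k + 2;
        a[k] = (double)i;
    }
}

/* After: k is replaced by its closed form 2 * (i + 1), so the subscript
   is a linear function of i. If k were used after the loop, the final
   value k = 2 * n would be assigned there. */
void fill_after(int n, double *a) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        a[2 * (i + 1)] = (double)i;
}
```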
     The following options are interpreted by ddopt. Options starting with
     -X are not recognized by the compiler driver and must be passed to
     ddopt via -Wd,... .

     -v        Turns on verbose mode. In this mode, ddopt prints the name of
               the procedure it is currently optimizing.

     -mips3    Tells ddopt that the target machine uses the MIPS3
               instruction set.

     -hostcache
               Tells ddopt to assume that the target machine has the same
               data cache size as the host machine, so that ddopt can
               obtain the cache size via a system call.

     -cachesz size
               Gives ddopt the data cache size of the target machine, in
               bytes. The default is 8192 bytes.

     -Xbldgr   Dumps the computed data dependency information, for
               debugging purposes.

     -Xbboptoff
               Turns off the conventional global optimizations that precede
               the data-dependency-related transformations.

     -Xbf size
               Changes the blocking factor used by ddopt in strip-mining.
               The default is 36 bytes.

     -Xdump    Tells ddopt to dump the original and transformed program in
               a compact, close-to-source-level format.

     -Xdosizethreshold count
               If the number of statements in a DO loop exceeds count, ddopt
               excludes that DO loop from transformation. The default is
               150.

     -Xgcopyoff
               Turns off global copy propagation.

     -Xinteroff
               Turns off loop interchange.

     -Xindepregoff
               Turns off loop-independent dependence register allocation.

     -Xinputregoff
               Turns off input dependence register allocation.
     -Xinvarregoff
               Turns off loop-invariant register allocation.

     -Xlcopyoff
               Turns off local copy propagation.

     -Xmergepiblockoff
               Disallows the merging of pi-blocks created for statements in
               the same basic block.

     -Xmoreunrolljam
               By default, unroll-and-jam is performed only on inner loop
               nests that result from strip-mining. This flag removes that
               restriction and tells ddopt to perform unroll-and-jam
               whenever it judges it to be advantageous.

     -Xmax_int_regs
               Tells ddopt the number of integer registers available on the
               underlying machine. The default is 32.

     -Xmax_float_regs
               Tells ddopt the number of floating-point registers available
               on the underlying machine. The default is 16.

     -Xofffoo  Turns off all transformations for the given procedure name
               ("foo" in this case).

     -Xoutputregoff
               Turns off output dependence register allocation.

     -Xoverallocate
               Tells ddopt to perform register allocation without regard to
               the number of registers available on the underlying machine.

     -Xstripoff
               Turns off strip-mining.

     -Xstriponly
               Tells ddopt to perform strip-mining but not to interchange
               the newly formed loops into a deeper level of the loop nest.
               This option is for debugging purposes only.

     -Xstat    Prints optimization statistics: the line numbers at which
               various transformations were applied and the number of times
               each was applied.

     -Xtrueregoff
               Turns off true dependence register allocation.

     -Xunrolloff
               Turns off loop unrolling.
     -Xunrolljamoff
               Turns off unroll-and-jam.

     -Xunrollthreshold count
               Limits unrolling so that the number of statements in the
               unrolled loop does not exceed count. The default is 180.

     -Xunrolltimes count
               Sets the maximum number of times a loop is unrolled. The
               default is 4.
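
     The options above do not say exactly how the statement limits and the
     unroll count combine. The following C function is a hypothetical
     model, not ddopt's code: it assumes that unrolling a loop k times
     yields k copies of its body, skips loops larger than
     -Xdosizethreshold, and picks the largest k that keeps the unrolled
     loop within -Xunrollthreshold statements and no greater than
     -Xunrolltimes. Under these assumptions and the default values (150,
     180 and 4), a 10-statement loop would be unrolled 4 times, a
     50-statement loop 3 times, and a 100-statement loop would be left as
     is.

```c
#include <assert.h>

/* Hypothetical model of ddopt's unrolling limits (an assumption, not its
   actual algorithm). The parameters mirror -Xdosizethreshold,
   -Xunrollthreshold and -Xunrolltimes; the result is the number of copies
   of the loop body, where 1 means the loop is not unrolled. */
int choose_unroll_factor(int stmts, int do_size_threshold,
                         int unroll_threshold, int unroll_times) {
    assert(stmts > 0 && unroll_threshold > 0 && unroll_times > 0);
    if (stmts > do_size_threshold)
        return 1;                     /* loop excluded from transformation */
    int k = unroll_threshold / stmts; /* copies that fit under the limit */
    if (k > unroll_times)
        k = unroll_times;             /* cap at the maximum unroll count */
    if (k < 1)
        k = 1;                        /* body alone exceeds the limit */
    return k;
}
```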

SEE ALSO
     ucode(1), uopt(1), btou(1), ppu(1)

DIAGNOSTICS
     ddopt assumes the input ucode file is error-free.